4. Isomorphism

(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev

Installation instructions for the CGRtools package and the tutorial files are available at https://github.com/cimm-kzn/CGRtools

NOTE: The tutorial should be run sequentially from the start. Running cells out of order will lead to unexpected results.


In [ ]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:
    print('WARNING. Tutorial was tested on 3.1 version of CGRtools')
else:
    print('Welcome!')

In [ ]:
# load data for tutorial
from pickle import load
from traceback import format_exc

with open('molecules.dat', 'rb') as f:
    molecules = load(f) # list of MoleculeContainer objects
with open('reactions.dat', 'rb') as f:
    reactions = load(f) # list of ReactionContainer objects

m2, m3 = molecules[1:3] # molecule
m7 = m3.copy()
m7.standardize()
r1 = reactions[0] # reaction
m5, m6 = r1.reactants[:2]
m8 = m7.substructure([4, 5, 6, 7, 8, 9], as_view=False)
m9 = m6.substructure([5, 6, 7, 8], as_view=False) # acid
m10 = r1.products[0].copy()

benzene = m3.substructure([4, 5, 6, 7, 8, 9], as_view=False)
cgr1 = m7 ^ m8
cgr1.reset_query_marks()
carb = m10.substructure([5, 7, 8, 2])
m2.reset_query_marks()

from CGRtools.containers import *
from CGRtools import CGRpreparer
preparer = CGRpreparer()

4.1. Molecules Isomorphism

CGRtools has a simple API for substructure and structure isomorphism. The backend uses the VF2 algorithm from the NetworkX library.

Note that in subgraph isomorphism atoms are matched only if they have the same charge, multiplicity, and isotope.
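
The attribute rule above can be illustrated with a small, library-independent sketch. It is not how CGRtools or NetworkX implement matching: it is a brute-force search over plain dictionaries, and the function `find_mappings`, the atom tuples, and the bond tables are hypothetical names chosen only for this illustration. As a simplification, the sketch only requires that every query bond is present in the target with the same order.

```python
from itertools import permutations

# Assumption of this sketch: an atom is a tuple
# (element, charge, multiplicity, isotope), and a bond table maps
# a frozenset of two atom numbers to a bond order.


def find_mappings(q_atoms, q_bonds, t_atoms, t_bonds):
    """Yield dicts mapping query atom numbers to target atom numbers."""
    q_nodes = sorted(q_atoms)
    # Try every injective assignment of query atoms to target atoms.
    for image in permutations(sorted(t_atoms), len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        # Atoms must agree in element, charge, multiplicity and isotope.
        if any(q_atoms[q] != t_atoms[t] for q, t in mapping.items()):
            continue
        # Every query bond must exist in the target with the same order.
        if all(t_bonds.get(frozenset(mapping[n] for n in bond)) == order
               for bond, order in q_bonds.items()):
            yield mapping


# Target: heavy atoms of an ethanolate anion, C-C-O(-).
target_atoms = {1: ('C', 0, None, None),
                2: ('C', 0, None, None),
                3: ('O', -1, None, None)}
target_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}

# Two C-O queries that differ only in the oxygen charge.
query_bonds = {frozenset({1, 2}): 1}
neutral_query = {1: ('C', 0, None, None), 2: ('O', 0, None, None)}
charged_query = {1: ('C', 0, None, None), 2: ('O', -1, None, None)}

neutral_hits = list(find_mappings(neutral_query, query_bonds,
                                  target_atoms, target_bonds))
charged_hits = list(find_mappings(charged_query, query_bonds,
                                  target_atoms, target_bonds))
print(neutral_hits)  # the neutral oxygen finds no partner
print(charged_hits)  # the charged oxygen maps onto atom 3
```

The brute-force search is only practical for a handful of atoms; VF2 prunes the search space instead of enumerating every assignment.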


In [ ]:
m7

In [ ]:
m8

In [ ]:
benzene.standardize()
benzene

In [ ]:
# isomorphism operations
print(benzene < m7)  # benzene is a substructure of m7
print(benzene > m7)  # benzene is not a superstructure of m7
print(benzene <= m7) # benzene is a substructure of (or the same structure as) m7
print(benzene >= m7) # benzene is neither a superstructure of nor the same structure as m7
print(benzene < m8)  # benzene is not a proper substructure of m8: the two are equal
print(benzene <= m8) # benzene is a substructure of or equal to m8

In [ ]:
m5

In [ ]:
m6

Mappings of a substructure onto a structure can be obtained with the substructure.get_substructure_mapping(structure, limit=1) method. The limit argument is the number of mappings to return; limit=0 returns all possible mappings. The method acts as a generator.

To get a mapping in a structure search, use the structure1.get_mapping(structure2) method. It returns only one possible mapping of all atoms for two isomorphic molecules. It was developed to put the atoms of two MoleculeContainers in the same order for some reaction-handling tasks: the dictionary it returns can be fed directly to the remap function (see above). For isomorphic molecules it works faster than get_substructure_mapping.
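
To make the reordering idea concrete, here is a minimal sketch that does not use CGRtools. The helper `remap_atoms` and the dictionary-based molecules are hypothetical stand-ins for the containers and their remap function; the sketch only shows how a mapping dictionary of the form {atom number in the first structure: atom number in the second} brings two isomorphic structures to the same numbering.

```python
def remap_atoms(atoms, bonds, mapping):
    """Return new atom and bond tables renumbered through `mapping`."""
    new_atoms = {mapping[n]: element for n, element in atoms.items()}
    new_bonds = {frozenset(mapping[n] for n in bond): order
                 for bond, order in bonds.items()}
    return new_atoms, new_bonds


# Heavy atoms of methanol, numbered in two different ways.
first_atoms = {1: 'C', 2: 'O'}
first_bonds = {frozenset({1, 2}): 1}
second_atoms = {10: 'O', 20: 'C'}
second_bonds = {frozenset({10, 20}): 1}

# Mapping from atom numbers of the first structure to the second one.
mapping = {1: 20, 2: 10}

renumbered_atoms, renumbered_bonds = remap_atoms(first_atoms, first_bonds, mapping)
same_numbering = (renumbered_atoms == second_atoms
                  and renumbered_bonds == second_bonds)
print(same_numbering)
```

After renumbering, the two structures can be compared atom by atom, which is the situation the reaction-handling use case above relies on.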


In [ ]:
m5.get_substructure_mapping(m6)  # mapping of substructure m5 into superstructure m6

In [ ]:
for m in m5.get_substructure_mapping(m6, limit=0):  # iterate over all possible substructure mappings
    print(m)

In [ ]:
benzene.get_mapping(m8)  # mapping of benzene into m8, which is also benzene

4.2. Reactions

ReactionContainers do not support isomorphism because of ambiguity, but the molecules in a reaction can be matched.


In [ ]:
try:            # a molecule cannot be matched against a reaction; an error is raised
    m6 < r1
except TypeError:
    print(format_exc())

In [ ]:
r1.products[0] # see structure in products

In [ ]:
m6 # substructure used. One can see, they should not match

In [ ]:
any(m6 < m for m in r1.products) # check if any molecule from product side has m6 as substructure

4.3. CGR

Substructure search is also possible with CGRContainer. The API is the same as for molecules.

A CGR can be matched into a CGR, and a molecule into a CGR. Note that only conventional bonds in a CGR can match molecular bonds.

In isomorphism, atoms are equal if they have the same charge, multiplicity, and isotope in both the reactant and product states.
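
The two rules above can be sketched without CGRtools. Everything in this block is hypothetical: `DynamicAtomSketch` and `molecule_bond_fits` are illustrative names, not library classes, and the sketch assumes that a "conventional" bond is one whose order is the same in the reactant and product states.

```python
from typing import NamedTuple, Optional, Tuple


class DynamicAtomSketch(NamedTuple):
    """Toy CGR atom carrying reactant-state and product-state labels."""
    element: str
    isotope: Optional[int]
    charge: int            # reactant state
    p_charge: int          # product state
    multiplicity: Optional[int]    # reactant state
    p_multiplicity: Optional[int]  # product state


def molecule_bond_fits(order: int,
                       cgr_bond: Tuple[Optional[int], Optional[int]]) -> bool:
    """A molecular bond fits only an unchanged CGR bond of the same order."""
    reactant_order, product_order = cgr_bond
    return reactant_order == product_order == order


# Atoms are equal only if both states agree (tuple equality checks all fields).
unchanged_oxygen = DynamicAtomSketch('O', None, 0, 0, None, None)
deprotonated_oxygen = DynamicAtomSketch('O', None, 0, -1, None, None)
atoms_equal = unchanged_oxygen == deprotonated_oxygen

# CGR bonds are written here as (reactant order, product order); None = no bond.
conventional = molecule_bond_fits(1, (1, 1))   # unchanged single bond
formed = molecule_bond_fits(1, (None, 1))      # bond formed in the reaction
changed = molecule_bond_fits(2, (1, 2))        # single bond becoming double

print(atoms_equal, conventional, formed, changed)
```

In this toy model only the unchanged single bond accepts a molecular single bond, and the two oxygen atoms differ because their product-state charges differ.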


In [ ]:
decomposed1 = preparer.decompose(cgr1) # let's have a look at reaction corresponding to cgr1
decomposed1

In [ ]:
m8 # this is the substructure we are looking for

In [ ]:
m8 < cgr1

In [ ]:
cgr1 <= cgr1

4.4. Queries


In [ ]:
# to use QueryContainers, neighbors and hybridization of the molecules need to be calculated
m9.reset_query_marks()
m10.reset_query_marks()

In [ ]:
m9 # acid

In [ ]:
m10 # ester

In [ ]:
carb

In [ ]:
print('m9:', f'{m9:hn}') # all labels were calculated
print('m10:', f'{m10:hn}')
print('carb:', f'{carb:hn}') # notice that one of the oxygen atoms has 2 neighbors. Only an ester can fit this restriction.

Molecule isomorphism does not take neighbors and hybridization into account.


In [ ]:
carb < m9 # carb is currently a molecule projection, so it fits this molecule as well.

In [ ]:
carb < m10 # carb is a substructure of m10

One needs to convert the molecule (or its projection) into a QueryContainer object. The number of neighbors and the hybridization will then be taken into account during substructure search.

The isomorphism API is the same.
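
The effect of the neighbor count can be shown with a small sketch that is independent of CGRtools. The helpers `count_neighbors` and `query_hits` and the dictionary-based molecules are hypothetical; hydrogens are left implicit, so only non-hydrogen neighbors are counted. Hybridization is not modeled here.

```python
def count_neighbors(bonds, atom):
    """Number of non-hydrogen neighbors of `atom` (hydrogens are implicit)."""
    return sum(1 for bond in bonds if atom in bond)


def query_hits(element, neighbors, atoms, bonds):
    """Atoms with the requested element and exactly `neighbors` neighbors."""
    return [n for n in sorted(atoms)
            if atoms[n] == element and count_neighbors(bonds, n) == neighbors]


# Acetic acid: C1-C2(=O3)-O4(H)
acid_atoms = {1: 'C', 2: 'C', 3: 'O', 4: 'O'}
acid_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({2, 4}): 1}

# Methyl acetate: C1-C2(=O3)-O4-C5
ester_atoms = {1: 'C', 2: 'C', 3: 'O', 4: 'O', 5: 'C'}
ester_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 2,
               frozenset({2, 4}): 1, frozenset({4, 5}): 1}

# Query atom: an oxygen with two non-hydrogen neighbors.
acid_hits = query_hits('O', 2, acid_atoms, acid_bonds)
ester_hits = query_hits('O', 2, ester_atoms, ester_bonds)
print(acid_hits)   # no oxygen in the acid has two heavy neighbors
print(ester_hits)  # the bridging oxygen of the ester qualifies
```

Without the neighbor restriction both molecules would offer a matching oxygen, which mirrors the difference between matching a plain molecule and matching a query.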


In [ ]:
q = QueryContainer(carb)  # convert molecule into query
print(q)     # the QueryContainer signature now shows that one of the oxygen atoms has 2 neighbors.

In [ ]:
q < m9 # now neighbors and hybridization are taken into account.

Acid m9 has a hydroxyl group with one non-hydrogen neighbor. Our query requires an oxygen atom with two non-hydrogen neighbors.


In [ ]:
q < m10 # the ester matches the query.

In [ ]:
m2.reset_query_marks()
m2

In [ ]:
q < m2 # this molecule does not contain q as a substructure either. It is an acid.